Screening for Autism Spectrum Disorder in Young Children: Still Not Enough Evidence

Background: Early detection of autism spectrum disorder (ASD) has the potential to significantly reduce the impact of the condition; however, previous reviews have found little evidence to support screening programs for ASD in young children. Methods: We conducted a review with the aim of updating evidence on 3 aspects: (a) diagnostic stability of ASD in young children; (b) accuracy of ASD screening tools in young children; and (c) the benefits of early interventions in screen-detected young children with ASD. Results: A total of 33 studies were included in our review. Five studies looking at diagnostic stability reported estimates ranging from 71.9% to 100%; however, the majority only included a follow-up of 24 months, and all studies raised concerns regarding the risk of bias, due particularly to lack of blinding, sample size, and patient flow. A total of 25 studies, reported in 26 articles, were identified that reported accuracy data on 11 screening tools. Most of the reports were concerned with versions of M-CHAT, reporting sensitivity estimates from 0.67 to 1.0; however, many of these were deemed to be of high risk of bias due to lack of blinding and follow-up. Four studies reported on early interventions in screen-detected children; however, the majority did not find significant improvements on the relevant outcomes. Conclusions: Overall, the evidence on screening for ASD in young children captured by this review is not conclusive regarding the 3 aspects of screening in this population. Future studies should attempt to ensure blinded diagnostic assessments, include longer follow-up periods, and limit attrition.


Background
Autism spectrum disorder (ASD) describes a range of neurodevelopmental disorders, 1 characterized by persistent and significant impairments in social interaction and communication, and varying degrees of restrictive and repetitive behaviors. 1 Evidence so far suggests that in more than 70% of individuals with ASD, there are other coexisting health, disability, or neurodevelopmental conditions present. 2,3 In terms of ASD prevalence, estimates differ greatly across the world, 4 due to varying techniques to identify cases, impact of sampling biases, and the cultural context. 5 A recent study estimates that on average, around 1 in 100 children is diagnosed with ASD globally. 3 In England, the prevalence of ASD in school children is estimated at 1.76% (95% CI 1.75%, 1.77%). 6 While definitions of the disease, as well as instruments for detecting it, have changed over the years, studies of prevalence 3,7,8 seem to suggest it is increasing; it is unclear, however, whether this is due solely to increased recognition, shifts in definition, or partially due to increased underlying risk factors.
ASD is highly heterogeneous. [10][11] It is expected that young children diagnosed with ASD can potentially receive targeted interventions before developmental plasticity is lost, to foster their improved communication skills, which will provide an advantage later in life. 12 Screening for ASD in young children has been considered. 13,14 Because screening is an emotional experience for parents and potentially stressful for children, it is important to avoid an approach that will result in many false positives and negatives, as the consequences and emotional impact of being told a child has autism are likely to be significant. 15,16 It is also possible that children with slow development as toddlers may improve to similar levels as their peers when older, and some evidence suggests that even children who meet ASD criteria at very young ages may improve to subclinical levels later on. 17,18 Thus, screening might be inappropriate if carried out too early, as it might result in false positive results due to an overly unstable diagnosis. These considerations emphasize the importance of high sensitivity, specificity, and positive predictive values of the screening test required to justify ASD screening at young ages, as well as the need to establish the stability of diagnosis into later childhood.
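The dependence of positive predictive value on prevalence can be illustrated with a minimal Python sketch. The sensitivity (0.80) and specificity (0.95) used here are assumed values for a hypothetical tool, not estimates from any included study; the 1% prevalence mirrors the global estimate of around 1 in 100 children cited above, and the other prevalences are included only for contrast.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a single screening test via Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical tool: sensitivity 0.80, specificity 0.95 (assumed values).
for prevalence in (0.01, 0.05, 0.20):
    ppv, npv = predictive_values(0.80, 0.95, prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.2f}  NPV={npv:.3f}")
```

Under these assumptions, most screen-positive results at 1% prevalence would be false positives, which is why specificity and PPV matter so much when screening an asymptomatic population.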
The effectiveness of early interventions is also crucial for reducing symptoms of ASD and in improving young children's life chances. Most interventions for ASD are behavioral, and as such are costly and time-consuming. They are almost always parent-mediated, and mothers (predominantly the primary carers) are disproportionately tasked with significant roles in the process, 19 leading to a potential loss of career and other opportunities. It is crucial that there is strong evidence to support the use of parent-mediated behavioral interventions, given the intense effort involved. However, the evidence base is limited so far, and the effectiveness of early interventions has hitherto remained unclear. 13 In 2011, the UK National Screening Committee (NSC), 13 and in 2016 the United States Preventive Services Taskforce (USPSTF), 14 recommended that screening should not be carried out in asymptomatic young children. Both recommendations highlighted concerns around the acceptability of screening for ASD, a lack of evidence for the benefits and harms of screening, and limited evidence on the effectiveness of early interventions. 13,14 We were commissioned to update evidence informing the UK NSC recommendation on screening for ASD in young children, specifically to address the following questions:
Question 1: What is the diagnostic stability of ASD in children aged under 5 years?
Question 2: What is the accuracy of screening tools in children under the age of 5 to identify ASD? Are there characteristics (such as the age at which the screening test is performed) that affect accuracy?
Question 3: Has the benefit of early intervention in children aged 5 years and younger, detected through screening, been demonstrated?

Methods
The review protocol was registered on PROSPERO (CRD42021231868). All 3 questions were covered by a single search strategy using a combination of free-text and medical subject headings (see Additional file 1). The search was carried out on MEDLINE (via OvidSp), EMBASE (via OvidSp), CINAHL (via EBSCOhost) and PsycINFO (via OvidSp), Cochrane Database of Systematic Reviews and CENTRAL, and ClinicalTrials.gov. All searches were limited to articles published since 2010. Databases were searched in November 2021; an update search was conducted in August 2022. Reference lists of included studies were checked for other relevant publications.
After removing duplicates across databases, titles/abstracts, and subsequently full texts, were reviewed against the inclusion/exclusion criteria (Table 1) by a single reviewer (either RH, BG, JW, or JP). Twenty percent of titles and abstracts were double screened. Any disagreements were resolved by discussion. Where the eligibility of an article was unclear at title and abstract screening, it was included for full-text screening to ensure that all potentially relevant studies were captured.
A bespoke data extraction form was developed and piloted for each question. All data were extracted by 1 reviewer for each question, then checked by a second reviewer. Discrepancies were resolved through discussion or mediated by a third reviewer. Tools to assess the quality and risk of bias of each study included in the review were applied depending on the design of each study. We used a modification of QUIPS (the Quality in Prognostic Studies), 20 modified versions of QUADAS-2 (Quality Assessment of Diagnostic Accuracy Studies), 21 and QUADAS-C (for comparative accuracy studies), 22 AMSTAR (A MeaSurement Tool to Assess systematic Reviews), 23 the Cochrane Collaboration's "Risk of Bias" Tool, 24 and ROBINS-I (Risk of Bias in Non-randomized Studies-Interventions). 25 A narrative synthesis of results is presented for each question. Where summary estimates have not been reported in studies, but raw data are available, these have been calculated.
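As an illustration of how summary accuracy estimates can be derived from raw counts, the following Python sketch computes the standard measures from a 2×2 table. It is a generic sketch, not the review's own calculation procedure; the class name, function name, and counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    tp: int  # screen-positive, ASD confirmed
    fp: int  # screen-positive, ASD not confirmed
    fn: int  # screen-negative, ASD confirmed
    tn: int  # screen-negative, ASD not confirmed


def ratio(numerator, denominator):
    """Return numerator / denominator, or None when the measure is undefined."""
    return numerator / denominator if denominator else None


def accuracy_measures(t):
    """Sensitivity, specificity, PPV and NPV from a 2x2 table."""
    return {
        "sensitivity": ratio(t.tp, t.tp + t.fn),
        "specificity": ratio(t.tn, t.tn + t.fp),
        "ppv": ratio(t.tp, t.tp + t.fp),
        "npv": ratio(t.tn, t.tn + t.fn),
    }


# Hypothetical counts, chosen only for illustration.
print(accuracy_measures(TwoByTwo(tp=16, fp=24, fn=4, tn=956)))

# If no screen-negative child was evaluated, only the PPV is defined.
print(accuracy_measures(TwoByTwo(tp=16, fp=24, fn=0, tn=0)))
```

Returning `None` for an undefined measure makes explicit that a study which evaluated no screen-negative children can support a PPV but not a sensitivity or specificity estimate.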

Results
Figure 1 details the number of articles identified, screened, and ultimately included in the systematic review. A total of 11 863 titles and abstracts were identified from the database searches. After de-duplication of articles from the different databases, the titles and abstracts of 6944 articles were screened using the inclusion and exclusion criteria. Two hundred and eighty-three articles were then screened at full-text, with 30 meeting our inclusion criteria. Three further articles were obtained from screening the references of those 30 articles or other sources. Thirty-three publications were ultimately judged to be relevant to 1 or more review questions (see Figure 1).

Q1: Diagnostic Stability
Five primary studies were relevant to Q1. These were published between 2013 and 2021 and included 1580 children in total, with a mean age range between 19 and 36 months at baseline. Two studies were based in the USA, 26,27 and 1 each in the UK, 28 Australia, 29 and Sweden. 30 The time interval between diagnosis and final follow-up assessment was approximately 24 months, except for the Swedish study (Spjut Jansson, 2016), 30 in which it was 60 months. Follow-up assessments included fewer than 100 children in all studies except for Pierce, 26 who followed up over 1200 children. In 4 of the 5 studies, all children meeting their inclusion criteria, regardless of whether their initial diagnosis was ASD, were followed up. Spjut Jansson 30 is the exception to this, as only children diagnosed with ASD were followed up. A summary of study characteristics, risk of bias, and results is presented in Table 2. In the studies providing estimates on the stability of a diagnosis of ASD over time in a screened population, these ranged from 71.9% to 100%. All studies raised concerns regarding risk of bias, including the lack of blinding of assessments at follow-up, participant attrition, clearly described methods of diagnosis, and a relatively small number of children evaluated at each time-point. In 1 study the population of screen-detected cases was mixed with clinically referred cases of ASD. It was unclear whether children received treatment during follow-up in some of the studies. Furthermore, 1 study which reported 100% stability 27 allowed for diagnosis to be deferred, suggesting that this estimate of 100% stability may not reflect the more difficult diagnoses. In fact, at follow-up (T2), 71% of those with a deferred diagnosis at baseline (T1) had been deemed as not having a diagnosis of ASD. The length of follow-up, which was 24 months for all but 1 study, 27 is another limitation of the findings from these studies.

Q2: Screening Accuracy Studies
Twenty-five primary studies were relevant for this question, including 1 study that was reported in 2 articles. 31,32 The included studies evaluated the performance of 11 screening tools. These tools and relevant findings are summarized in Table 3.
All 25 studies report screening for ASD in a community-based population; in many cases, screening was part of routine surveillance appointments. Six studies were based in the USA, 4 in Turkey, 2 each in Israel, Spain, and Australia, and 1 study in each of the following: the UK, Chile, China, Iceland, Japan, Italy, France, Sweden, and Nepal.
The age of screened children was between the ages of 12 and 36 months in most (18) of the studies. Mozolic-Staunton et al 40 reported screening children at the ages of 12, 18, 24, and 36 to 60 months, while Catino et al 44 screened children at 42 and 48 months old. Most of the studies were prospective cohort studies; Achenie et al, 48 Dai et al, 49 and Mozolic-Staunton et al 40 reported re-analyses of previous cohort studies. Achenie et al 48 conducted a retrospective analysis of data prospectively collected by Robins et al. 33 The studies were generally found to be of low risk of bias on QUADAS-2 for the patient selection, index test, and reference standard domains. A number of studies were carried out with the aim of validating the M-CHAT(R/F) screening tool in populations where English is not the first language; this required the translation of the M-CHAT(R/F). Timing and patient flow were domains on which many studies were deemed to be at a high risk of bias. Design choices were generally justified by the complexity of the diagnostic evaluation for ASD, making it time and resource intensive. Consequently, children who were deemed screen-negative either did not have a diagnostic evaluation, 36,40,42,44,[49][50][51] only had an evaluation if they were screen-positive on another tool(s) and/or a health professional raised a concern for possible ASD diagnosis, 28,33,38,41,[46][47][48][52][53][54] or had their medical records checked for a diagnosis of ASD at a later date (after 10 months, 39 after 24 months, 31,32 or at an unknown time-point 55 ).
In 4 studies, a sample of children who were deemed to be screen-negative received a diagnostic evaluation, 37,43,45,56 while Suren et al 57 took a number of different approaches. Although a practical constraint, only offering a diagnostic evaluation to those children who screen positive can lead to biased accuracy estimates (depending on the approach taken and the calculation and reporting of summary accuracy measures). One particular challenge of these study designs was blinding the diagnostic assessment to the screening result, as only screen-positive children received a diagnostic evaluation.
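The effect of evaluating only some screen-negative children can be sketched in Python as follows. This is a generic illustration of partial verification and of inverse-probability weighting as one possible correction, not a method reported by the included studies; the function names and all counts are hypothetical.

```python
def naive_sensitivity(tp_verified, fn_verified):
    """Sensitivity computed only from children who received a diagnostic evaluation."""
    return tp_verified / (tp_verified + fn_verified)


def weighted_sensitivity(tp, fn_in_sample, sampling_fraction):
    """Sensitivity when all screen-positives and a random fraction of
    screen-negatives are evaluated; missed cases found in the sample are
    scaled up by the inverse of the sampling fraction."""
    estimated_fn = fn_in_sample / sampling_fraction
    return tp / (tp + estimated_fn)


# Hypothetical study: 40 cases confirmed among screen-positive children;
# a random 10% of screen-negative children are evaluated and 1 case is found.
print(round(naive_sensitivity(40, 1), 3))           # ignores unevaluated children
print(round(weighted_sensitivity(40, 1, 0.10), 3))  # accounts for the 90% not evaluated
```

In this hypothetical example, the unweighted estimate overstates sensitivity because the one missed case found in the sample stands in for roughly ten missed cases among all screen-negative children.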
Where estimates of sensitivity were reported, those for M-CHAT(R/F) ranged from 0.67 to 1.0, with many studies reporting sensitivity estimates of around 0.8 depending on age group or cut-off used (see Table 4).
The sensitivity estimates for other tools spanned the range of possible estimates from 0.07 (0.03, 0.14) for PEDS 40 to 1.0 (0.72, 1.0) for Q-CHAT. 28 Where only PPVs could be estimated with confidence (ie, ASD diagnosis in screen-negative children was not undertaken or assumed), these ranged from <0.10 (for the ASQ 44 and M-CHAT-R 38 ) to ≥0.80 for TIDOS, 43 M-CHAT/F, and/or JA-OBS 47 (see Table 5).
The comparative accuracy analyses between M-CHAT/F and other screening tools suggest that tools incorporating observation of the child (TIDOS and JA-OBS) tended to perform better than parent/carer reported questionnaires such as M-CHAT/F (see Tables 4 and 5). Beside non-M-CHAT(-R/F), TIDOS and JA-OBS were the only other tools with estimates of sensitivity above 0.5: 0.8 (0.28, 0.99) for TIDOS 43 and 0.86 (0.72, 0.95) for JA-OBS. 47 Little evidence was found on whether age or other characteristics impact on screening accuracy. 44,55,57 In 8 studies where screening uptake could be extracted or calculated, estimates ranged from 40% to 88.7% (see Tables 4 and 5). Where the screening tool used involved 2 stages (ie, M-CHAT(-R)/F), 6 studies reported the proportion of patients with complete screening information, which ranged from 70% to 84%. Fourteen studies reported uptake for the diagnostic evaluation of screen-positive children, ranging from 57.5% to 100%. Canal-Bedia et al 51 reported an uptake of the diagnostic evaluation of only 9.2%. Two studies reported 100% uptake; however, both studies included fewer than 20 children.
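The width of the confidence intervals quoted above reflects how few children with ASD contribute to some sensitivity estimates. The Python sketch below computes an exact (Clopper-Pearson) binomial interval using only the standard library. The choice of interval method is an assumption, since the text does not state which method the review authors used, and the counts in the example are hypothetical rather than taken from any included study.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_decreasing(f, target, iterations=60):
    """Bisection on [0, 1] for a function f that decreases as p increases."""
    low, high = 0.0, 1.0
    for _ in range(iterations):
        mid = (low + high) / 2
        if f(mid) > target:
            low = mid  # f is still too large, so the root lies at a larger p
        else:
            high = mid
    return (low + high) / 2


def exact_ci(k, n, alpha=0.05):
    """Clopper-Pearson interval for k successes out of n trials."""
    lower = 0.0 if k == 0 else _solve_decreasing(
        lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    upper = 1.0 if k == n else _solve_decreasing(
        lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper


# Hypothetical example: 4 of 5 children with ASD screen positive.
low, high = exact_ci(4, 5)
print(f"sensitivity = {4 / 5:.2f} (95% CI {low:.2f}, {high:.2f})")
```

With these hypothetical counts the interval runs from roughly 0.28 to 0.99, showing that a point estimate of 0.8 based on a handful of cases is compatible with both poor and excellent sensitivity.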

Q3: Early Interventions
A total of 114 titles were reviewed in full text for eligibility on this question. Following full-text screening, only 3 studies were eligible: 2 USA-based RCTs 58,59 and a prospective cohort study conducted in Sweden. 30 A fourth study, conducted in Australia, was identified from other sources. 60 Study characteristics, risk of bias, and results reported in the 4 included studies are summarized in Table 6.
Two RCTs 58,59 included children from a community population in the USA, who were found to be at risk of ASD according to the First Year Inventory (FYI) tool. In Baranek et al, 58 children who had not screened positive on FYI, but had concerns raised by their parents, were also included. In both studies, children deemed to be at risk of ASD were randomized to either receive a parent-led intervention (Adapted Responsive Teaching, ART) lasting 24 weeks, or were referred to existing services in the local communities (referral to early intervention and monitoring, REIM). ART was described as a home-based, relationship-focussed intervention that encouraged parents "to use responsive strategies during daily routines with their children, [...] designed to target "pivotal" behaviors (eg, social play, joint attention, arousal and attention, engagement, adaptability, and coping)." 58 In both studies, the intervention was provided over 6 months, and included 36 planned contacts (mainly home sessions, with additional phone calls and emails) between parents and professionals experienced in child development. Participants were identified from relatively large community populations (2261 and 8709 screen results were available, respectively); despite this, the studies included relatively small samples (16 and 87, respectively). Contacted authors confirmed that the 2 studies used different samples, independent of each other.
One study 58 found ART was significantly associated with improved receptive language, socialization, and sensory hyporesponsiveness compared to REIM.However, the larger RCT 59 found no evidence that ART was associated with improved outcomes compared to REIM.
The Swedish prospective cohort study 30 included children aged 2 and a half years who were referred to the Child Neuropsychiatric Clinic following a positive screen result from routine ASD screening. Children received a wide range of interventions, varied in type and intensity. Evaluation after 2 years did not show any significant differences between the interventions on the Vineland Adaptive Behavior Scale, Second Edition (VABS-II) 61 and the Children's Global Assessment Scale (C-GAS). 62
Whitehouse et al 60 included children aged 12 months who were referred mostly due to a positive result following community-wide screening. Children were randomized to either the iBASIS-Video Interaction to Promote Positive Parenting (iBASIS-VIPP) intervention or usual care. Data showed a reduction in ASD symptom severity (ABC, −5.53; 95% CI, −∞ to −0.28; P = .04) and reduced odds of ASD classification (odds ratio, 0.18; 95% CI, 0-0.68; P = .02) in the intervention group, 2 years after baseline.
The quality of the included studies was generally acceptable, with some concerns due to the lack of blinding to the intervention type for the RCTs. Spjut Jansson et al 30 presented the least overall concern of bias; however, the study was not randomized. The population included in Whitehouse et al 60 represented a mix of screened and referred participants. When contacted for clarification, authors of the study confirmed that the majority of the included children had a positive screen result, with only a minority in 1 trial site being referred rather than screened.

Statement of Principal Findings
Overall, the evidence on screening for ASD in young children remains weak.
Regarding diagnostic stability, although studies indicated that between 72% and 100% of children with a diagnosis of ASD retained their diagnosis 2 years later, the risk of bias within these studies, in particular the lack of blinding, means that diagnostic stability is likely to be over-estimated. Additionally, there is little evidence of the stability of ASD diagnoses beyond the age of 4 or 5 years.
Regarding the accuracy of screening, this review indicates uncertainty as to the performance and uptake of screening tools to identify children with ASD. Most studies evaluated adaptations of the M-CHAT screening tool in other languages. In these studies, estimates of sensitivity for M-CHAT(R/F) ranged from 0.67 to 1.0. There was some evidence that tools which included observation of the child by professionals may have better accuracy than the M-CHAT(R/F). Screening uptake was also variable, ranging from 40% to 89%. However, the main limiting factors are uncertainty on the stability of diagnoses of ASD when made at young ages, and the current lack of evidence regarding the effectiveness of interventions for children identified through screening.
In terms of the effectiveness of interventions, only 4 studies evaluated interventions in young children identified through screening for ASD, the largest of which still only included 89 children. This study reported the treatment effect (reduced ASD severity) was maintained after 2 years. However, the study sample included a small number of referred patients alongside the screen-detected majority, making the generalizability of these results unclear. There was no evidence of improved outcomes in the other studies. Overall, evidence of the long-term outcomes of early intervention in young children identified through screening remains limited, as the maximum follow-up among the studies identified was just 2 years.

Strengths and Weaknesses of the Study
Particular aspects of study design limited many of the studies included in this review. For instance, a lack of blinding limited the interpretation of most of the studies that evaluated diagnostic stability and many of the screening accuracy studies. Short follow-up periods limited the extent to which diagnoses can be said to be stable, and interventions effective, beyond 2 years. A particular limitation of many of the screening accuracy studies is to what extent, and how, screen-negative children were followed up, so that reliable estimates of sensitivity and specificity could be obtained. This might entail testing a random sample of children who screened negative, or a later assessment of medical records for ASD diagnosis in children who screened negative (though such an approach also carries risks of bias).
Our research question focussed on children who were asymptomatic for ASD; therefore, we are not able to comment on the ability of these screening tools in children who may have symptoms of ASD, nor offer insight on the types of symptoms, or characteristics, that might prompt screening for ASD. Other research has looked at the use of such screening tools in children at higher risk of ASD, such as the younger siblings of children diagnosed with ASD. 63 There are a number of potential benefits from screening for ASD in young children which were not included in our systematic review, such as increasing social equity from systematic screening across populations, and alleviating concern in parents following a diagnosis that would not have been identified otherwise.

Strengths and Weaknesses in Relation to Other Studies
Both the 2011 UK NSC evidence review and 2016 USPSTF review included RCTs that evaluated interventions for children ≤5 years old diagnosed with ASD, but none of these were in children who had been identified through screening for ASD. In contrast, children included in these RCTs had "significant impairments in cognition, language, and behavior," 14 suggesting more severe symptoms than screen-detected ones; they were also outside the intended age range of the screening tools. Although evidence did suggest that interventions could be effective in children with ASD, there remains significant uncertainty regarding the applicability of these findings in a younger population of screen-detected children.
Our finding that tools which include observation of children may perform better than parent/carer-completed checklists confirms some of the findings from the 2011 UK NSC review, where surveillance of children by trained professionals was associated with high sensitivity (0.94 for children aged 3.5 years). 13 However, use of observational screening tools compared to parent/carer-completed screening questionnaires has considerable implications for available resources.
More recent systematic reviews of the accuracy of screening tools for ASD in children have reached similar conclusions on the variability of results and limitations in study design. For example, in their systematic review on the accuracy of screening tools for ASD in children up to 12 years of age, Levy et al 64 highlight variability in the performance of even the same tool across settings, ages, and contexts. A review of screening tools in children <2 years old by Petrocchi et al 65 reports similar conclusions on the variability of accuracy estimates across studies, and calls for more studies evaluating tools in the general population "with the purpose for making a diagnostic evaluation." Although reporting that their meta-analysis results indicate accuracy of screening tools for children aged 14-36 months, Sanchez-Garcia et al point out that over half of the studies were deemed to be at high risk of bias, and comment on the challenge of following up those children who have a negative screen result.

Unanswered Questions and Future Research
To reduce uncertainty in the evidence for ASD screening in young children, better designed studies are needed. Evidence on the stability of ASD diagnoses beyond 4 or 5 years old is lacking. Further studies on diagnostic stability should consider longer follow-up periods, so that the diagnostic assessment can be made at primary school age. Ideally, follow-up assessments should be blinded to previous assessments. To evaluate accuracy of screening tools, more studies should attempt to follow up a proportion of children who screen negative, in order to improve the reliability of sensitivity and specificity estimates. Again, measures should be taken to blind the diagnostic evaluation to the screening results. Comparative accuracy of more than 1 tool would be useful, especially when questionnaire-based tools are included together with tools involving child observation. Research is also needed on factors influencing the uptake of ASD screening and diagnostic assessment. Another relevant aspect is resource use associated with the screening tools; current evidence suggests there is a trade-off, with parent/carer questionnaires requiring relatively few resources, while approaches featuring behavioral observation by professionals need significantly more resources, if training of such professionals is taken into account.
Designing the ideal study to evaluate early interventions in a screened population is challenging. Large studies are usually preferred because they provide the power to accurately evaluate intervention effects. However, the low prevalence of ASD means that, in order to achieve the required power, the population to screen would need to be very large.
Besides the low prevalence of ASD, another problem is attrition. This is particularly important in screened populations, because of the many ways in which participants could drop out. Our review found that screening uptake ranged from 40% to 88.7%, and could be as low as 57.5% for the subsequent diagnostic evaluation of children at risk. These 2 issues are illustrated in the study by Watson et al: 59 more than 8700 children were screened in order to identify a sample of 102 participants needed to detect a statistically significant difference, but only 87 consented and were available for randomization.
Thus, screening of larger or multiple populations might be necessary to ensure adequate power for evaluating the effectiveness of the interventions. Measures to mitigate attrition should also be considered. Finally, longer follow-up would allow a better evaluation of long-term outcomes following early interventions.
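The way low prevalence and attrition compound can be sketched with a simple multiplicative model in Python. The function and its parameters are hypothetical; the target of 102 participants and the 57.5% diagnostic uptake echo figures quoted above, while the screening uptake, screen-positive rate, and consent rate are assumed values chosen only for illustration.

```python
from math import ceil


def children_to_invite(target_randomized, screening_uptake,
                       screen_positive_rate, diagnostic_uptake, consent_rate):
    """Number of children to invite for screening so that, on average,
    target_randomized of them reach randomization, assuming each stage
    retains a fixed, independent fraction of participants."""
    yield_per_invited = (screening_uptake * screen_positive_rate
                         * diagnostic_uptake * consent_rate)
    return ceil(target_randomized / yield_per_invited)


# Assumed values: 60% screening uptake, 2% screen-positive rate,
# 57.5% diagnostic uptake, 85% consent among eligible families.
needed = children_to_invite(102, 0.60, 0.02, 0.575, 0.85)
print(needed)
```

Because the stage-specific fractions multiply, a modest loss at each step translates into a large increase in the population that must be approached, which is why mitigating attrition matters as much as enlarging the screened population.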

Conclusions
ASD is highly heterogeneous, and a wide range of cognitive, learning, language, medical, emotional and behavioral problems co-occur to variable degrees. Screening to identify ASD has been discussed for many years and, as this systematic review has shown, a number of screening tools are available. Although this update review identified new studies reporting on diagnostic stability of ASD, screening tool accuracy, and early intervention effectiveness, the evidence remains unclear and insufficient to support ASD screening in young children. Better designed studies are needed to reduce uncertainty on the stability of ASD diagnoses, the accuracy of screening tools, and the effectiveness of early interventions in children <5 years old for whom no concerns of ASD have been previously identified.
Abbreviations: M-CHAT/F, original M-CHAT with follow-up interview; M-CHAT-R/F, revised M-CHAT with follow-up interview.
a The M-CHAT/F was first published in 2001, and a revised version, M-CHAT-R/F, was published in 2014. 33
b Q-CHAT and M-CHAT(-R/F) are both based on a version of the CHAT.

Table 1. Inclusion and Exclusion Criteria for the Key Questions.
Abbreviations: ASD, autism spectrum disorder; GP, general practitioner; HCP, health care professional; IQ, intelligence quotient; NICE, The National Institute for Health and Care Excellence; NPV, negative predictive value; PPV, positive predictive value.

Figure 1. PRISMA diagram of study inclusion (records excluded after full-text review: 247; records from screening references of included papers: 2 for Q2; records identified from other sources: 1 for Q3).

Table 2. Characteristics, Risk of Bias, and Results for Studies Addressing Diagnostic Stability in a Population of Children Identified by Screening.
b 95% confidence intervals calculated by review authors.

Table 3. Summary of Screening Tools Evaluated in the Included Studies.

Table 4. Summary of Studies Evaluating the Screening Accuracy of Q-CHAT and Versions of M-CHAT.

Abbreviations: ADOS-G, Autism Diagnosis Observation Schedule-Generic; BOT, Binomial Observation Test; CARS, Childhood Autism Rating Scale; FT, flow and timing; FUI, follow-up interview for M-CHAT(-R); IT, index test (screening tool); NPV, negative predictive value; NR, not reported; PPV, positive predictive value; PS, participant selection; RS, reference standard (diagnostic evaluation); Sens, sensitivity; Spec, specificity.
a Note that studies that only report PPV and do not follow up any children who have a negative screen are deemed to be of low risk of bias. However, if such a study reports sensitivity and specificity, then it is deemed to be of high risk of bias.
b Estimates calculated by review authors.

Table 5. Summary of Studies Evaluating the Screening Accuracy of Tools Other Than M-CHAT(-R/F) or in Combination With M-CHAT(-R/F).
a 95% confidence intervals calculated by review authors.

Table 6. Characteristics, Risk of Bias, and Results for Studies Evaluating the Effectiveness of Early Interventions.
Abbreviations: ART, adapted responsive teaching; C, confounding; FU, follow-up; FYI, First Year Inventory; I, intervention(s); MD, missing data; NA, not applicable; NR, not reported; OM, outcome measurement; P, participants; R, randomization; RB, bias in reporting outcomes; RCT, randomized controlled trial; REIM, referral to early intervention and monitoring.
a Cochrane Risk of Bias tool for RCTs, ROBINS-I for cohort study.